%thomas[e88,jmc] Another try at AI and logic for Thomason
\input memo.tex[let,jmc]
\title{Artificial Intelligence, Logic and Formalizing Common Sense}
\section{Introduction}
This is a position paper about the relations among
artificial intelligence (AI), mathematical logic and the
formalization of common sense knowledge and reasoning. It also
treats problems of concern to both AI and philosophy. I thank
the editor for inviting it. The position advocated is that
philosophy can contribute to AI if it treats some of its
traditional subject matter in more detail and that this will
advance the philosophical goals also. Actual formalisms (mostly
first order languages) for expressing common sense facts are
described in the references.
One path to human-level AI is formalizing common sense
knowledge and reasoning in mathematical logic and solving
problems by logical reasoning. Its methodology requires
understanding the common sense world well enough to formalize
facts about it and ways of achieving goals in it. Basing AI on
understanding the common sense world is different from basing it
on understanding human psychology or neurophysiology. This
computer science approach to AI is complementary to approaches
that start from the fact that humans exhibit intelligence and
explore human psychology or human neurophysiology.
This article discusses the problems and difficulties, the
results so far, and some improvements in logic and logical languages
that may be required to formalize common sense. Fundamental
conceptual advances are almost certainly required. The object of the
paper is to get more help for AI from philosophical logicians. Some
of the requested help will be mostly philosophical and some will be
logical. Likewise the concrete AI approach may fertilize
philosophical logic as physics has repeatedly fertilized mathematics.
There are three reasons for AI to emphasize common sense
knowledge rather than the knowledge contained in scientific
theories.
Scientific theories represent compartmentalized
knowledge. In presenting a scientific theory, as well as in
developing it, there is a common sense pre-scientific stage. In
this stage, it is decided or just taken for granted what
phenomena are to be covered and what is the relation between
certain formal terms of the theory and the common sense world.
Thus in classical mechanics it is decided what kinds of bodies
and forces are to be used before the differential equations are
written down. In probabilistic theories, the sample space is
determined. In theories expressed in first order logic, the
predicate and function symbols are decided upon. The axiomatic
reasoning techniques used in mathematical and logical theories
depend on this having been done. However, a robot or computer
program with human-level intelligence will have to do this for
itself. To use science, common sense is required.
Once developed, a scientific theory remains imbedded
in common sense. To apply the theory to a specific problem requires
matching common sense descriptions to the terms of the theory.
As an example, a formalization of
the relation between the formula $s = {1\over 2} gt↑2$
and the facts of specific situations in which bodies fall
is discussed in (McCarthy and Hayes 1969). It uses the ``situation
calculus'' introduced in that paper.
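To give the flavor of such a formalization (the predicate and function
names here are only illustrative, not those of the 1969 paper), the
connection might be made by a situation calculus sentence like
%
$$(∀ b\ h\ t\ s)(falling(b,s) ∧ height(b,s) = h ⊃ height(b,result(elapse(t),s)) = h - {1\over 2}gt↑2),$$
%
where $result(elapse(t),s)$ denotes the situation after $t$ seconds have
elapsed starting from the situation $s$, ignoring for brevity the
qualification that the body must still be falling throughout the interval.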
Another reason is that common sense reasoning is required
for solving problems in the common sense world. The common sense
world, from the problem solving or goal-achieving point of view is
characterized by a different {\it informatic situation} than that
{\it within} any formal scientific theory. In the common sense informatic
situation, the reasoner doesn't know what facts are relevant to
solving his problem. Unanticipated obstacles may arise that involve
using parts of his knowledge not previously thought to be relevant.
Finally, the informal metatheory of any scientific theory
has a common sense informatic character. A chess player thinks about
king's side attacks and passed pawns in general and doesn't merely
think about the locations of pieces in the current position. A
mathematician invents the concept of a group in order to make
previously vague parallels between different domains into a precise
notion.
It might be supposed that the common sense world would admit a
conventional scientific theory, e.g. a probabilistic theory. No-one
has developed it, and AI has taken a somewhat different course that
involves nonmonotonic extensions to the kind of reasoning used in
formal scientific theories. This seems to us likely to work better.
Aristotle, Leibniz, Boole and Frege all had common sense
knowledge in mind when they discussed formal logic. However,
formalizing much of common sense knowledge and reasoning proved
elusive, and the twentieth century emphasis has been on formalizing
mathematics. Some important philosophers, e.g. Wittgenstein,
have mistakenly claimed that
common sense knowledge is unformalizable or mathematical logic is
inappropriate for doing it. This is partly a consequence of the
inadequacy of the specific collections of predicates and functions
they took as examples, partly of not formalizing nonmonotonic
reasoning, and probably also of the lack of logical tools still to be
discovered. While I acknowledge this opinion, I haven't the time or
the scholarship to deal with the full range of such arguments.
Instead I will present the positive case, the problems that have
arisen, what has been done and the problems that can be foreseen.
If a computer is to store facts about the world and reason
with them, it needs a precise language, and the program has to embody
a precise idea of what reasoning is allowed, i.e. of how new formulas
may be derived from old. Therefore, it was natural to try to use
mathematical logical languages to express what an intelligent computer
program knows that is relevant to the problems we want it to solve and
to make the program use logical inference in order to decide what to
do. (McCarthy 1959) contains the first proposals to use logic in AI
for expressing what a program knows and how it should reason.
(Proving logical formulas as a domain for AI had already been
studied).
The 1959 paper said:
\begingroup\narrower\narrower
% COMMON.TEX[E80,JMC] TeX version Programs with Common Sense
%
The {\it advice taker} is a proposed program for solving problems by
manipulating sentences in formal languages. The main difference
between it and other programs or proposed programs for manipulating
formal languages (the {\it Logic Theory Machine} of Newell, Simon and
Shaw and the Geometry Program of Gelernter) is that in the previous
programs the formal system was the subject matter but the heuristics
were all embodied in the program. In this program the procedures will
be described as much as possible in the language itself and, in
particular, the heuristics are all so described.
The main advantages we expect the {\it advice taker} to have
is that its behavior will be improvable merely by making statements to
it, telling it about its symbolic environment and what is wanted from
it. To make these statements will require little if any knowledge of
the program or the previous knowledge of the {\it advice taker}. One
will be able to assume that the {\it advice taker} will have available
to it a fairly wide class of immediate logical consequence of anything
it is told and its previous knowledge. This property is expected to
have much in common with what makes us describe certain humans as
having {\it common sense}. We shall therefore say that {\it a program
has common sense if it automatically deduces for itself a sufficiently
wide class of immediate consequences of anything it is told and what
it already knows.}
\par\endgroup
The main reasons for using logical sentences extensively in AI
are better understood by researchers today than in 1959. Expressing
information in declarative sentences is far more modular than
expressing it in segments of computer program or in tables. Sentences
can be true in much wider contexts than specific programs can be
useful. The supplier of a fact does not have to understand much about
how the receiver functions or how or whether the receiver will use it.
The same fact can be used for many purposes, because the logical
consequences of collections of facts can be available.
The {\it advice taker} prospectus was ambitious in 1959, would
be considered ambitious today and is still far from being immediately
realizable. This is especially true of the goal of expressing the
heuristics guiding the search for a way to achieve the goal in the
language itself. The rest of this paper is largely concerned with
describing what progress has been made, what the obstacles are, and
how the prospectus has been modified in the light of what has been
discovered.
The formalisms of logic have been used to differing
extents in AI, mostly much less ambitious, and we'll begin by
recounting some of them.
1. A machine may use no logical sentences --- all its
``beliefs'' being implicit in its state. Nevertheless, it is often
appropriate to ascribe beliefs and goals to the program, i.e. to
remove the above sanitary quotes, and to use a principle of
rationality --- {\it It does what it thinks will achieve its goals}.
Such ascription is discussed from somewhat different points of view
in (Dennett 1971), (McCarthy 1979) and
(Newell 1980). The advantage is that the intent of the machine's
designers and the way it can be expected to behave may be more readily
described {\it intentionally} than by a purely physical description.
The relation between the physical and the {\it intentional}
descriptions is most readily understood in simple systems that admit
readily understood descriptions of both kinds, e.g. thermostats. Some
finicky philosophers object to this, contending that unless a system
has a full human mind, it shouldn't be regarded as having any mental
qualities at all. This is like omitting the numbers 0 and 1 from the
number system on the grounds that numbers aren't required to count
sets with no elements or one element.
Indeed if your main interest is the null set or unit sets, the numbers
0 and 1 are irrelevant, but the number system loses clarity and uniformity
if they are omitted. Likewise, when one studies phenomena like belief,
e.g. because one wants a machine with beliefs and which reasons about
beliefs, it works better to consider simplified cases first.
Much more, see (McCarthy
1979a), can be said about ascribing mental qualities to machines, but
that's not where the main action is in AI.
2. The next level of use of logic involves computer programs
that use sentences in machine memory to represent their beliefs but
use other rules than ordinary logical inference to reach conclusions.
New sentences are often obtained from the old ones by ad hoc programs.
Moreover, the sentences that appear in memory belong to a
program-dependent subset of the logical language being used. Adding
certain true sentences in the language may even spoil the functioning
of the program. The languages used are often rather unexpressive
compared to first order logic, for example they may not admit
quantified sentences, or they may use a
different notation from that used for ordinary facts to represent
``rules'', i.e. certain universally quantified implication sentences.
Often rules cannot be consequences of the program's reasoning; they
must have all been put in by the ``knowledge engineer''. Sometimes
the reason programs have this form is just ignorance, but the usual
reason for the restriction is the practical one of making the program
run fast and deduce just the kinds of conclusions its designer
anticipates. Most often the implications are used in just one
direction, i.e. the contrapositive is not used. We
believe the need for such specialized inference will turn out to be
temporary and will be reduced or eliminated by improved ways of
controlling general inference, e.g. by allowing the heuristic rules to
be also expressed as sentences as promised in the above extract from
the 1959 paper.
3. The third level uses first order logic and also logical
deduction. Typically the sentences are represented as clauses, and the
deduction methods are based on J. Allen Robinson's (1965) method of
resolution. It is common to use a theorem prover as a problem solver,
i.e. to determine an $x$ such that $P(x)$ as a byproduct of a proof of
the formula $∃xP(x)$.
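To make this concrete with a small invented instance: given the premisses
%
$$(∀x)(P(x) ⊃ Q(f(x)))\qquad{\rm and}\qquad P(a),$$
%
a resolution proof of $(∃y)Q(y)$ instantiates $y$ to $f(a)$, and carrying
an answer literal through the refutation, as in Green's question-answering
method, extracts the solution $y = f(a)$.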
This level is less used for practical
purposes than level two, because techniques for controlling the
reasoning are still insufficiently developed, and it is common for the
program to generate many useless conclusions before reaching the desired
solution. Indeed, unsuccessful experience (Green 1969) with this method
led to more restricted uses of logic, e.g. the STRIPS system of (Fikes
and Nilsson 1971).
%The promise of (McCarthy 1960) to express the
%heuristic facts that should be used to guide the search as logical
%sentences has not yet been realized by anyone.
The commercial ``expert system shells'', e.g. ART, KEE and
OPS-5, use logical representation of facts, usually ground facts only,
and separate facts from rules. They provide elaborate but not always
adequate ways of controlling inference.
In this connection it is important to mention logic programming,
first introduced in Microplanner (Sussman et al., 1971)
and from different points of view by Robert Kowalski (1979) and Alain
Colmerauer in the early 1970s.
A recent text is (Sterling and Shapiro 1986). Microplanner
was a rather unsystematic collection of tools, whereas Prolog relies
almost entirely on one kind of logic programming, but the main idea
is the same. If one uses a restricted class of sentences, the so-called
Horn clauses, then a restricted form of logical deduction can be used,
the control problems are much eased, and the programmer can
anticipate the course the deduction will take.
The price paid is that only certain kinds of facts are conveniently
expressed as Horn clauses, and the depth first search built into
Prolog is not always appropriate for the problem.
Even when the facts can be expressed, the reasoning carried
out by a Prolog program may not be appropriate. For example, the
fact that a sealed container is sterile if all the bacteria in it
are dead and the fact that heating a can kills any
bacterium in the can can both be expressed as Prolog clauses.
However, the resulting program for sterilizing a container
will kill each bacterium individually, because it will have to
index over the bacteria. It won't reason that heating the
can kills all the bacteria at once, because it doesn't do
universal generalization.
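The two facts might be rendered (the rendering and the predicate names are
mine, chosen only for illustration) as
%
$$(∀c)(sealed(c) ∧ (∀b)(bacterium(b) ∧ in(b,c) ⊃ dead(b)) ⊃ sterile(c))$$
%
and
%
$$(∀ b\ c)(bacterium(b) ∧ in(b,c) ∧ heated(c) ⊃ dead(b)).$$
%
The second is a Horn clause directly, while the universally quantified
condition in the first is handled in Prolog by negation as failure, which
is what forces the program to enumerate the bacteria in the container one
by one.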
Nevertheless, expressibility in Horn clauses is an
important property of a set of facts and logic programming has
been successfully used for many applications. However, it seems
unlikely to dominate AI programming as some of its advocates
hope.
Although third level systems express both facts and rules
as logical sentences, they are still rather specialized. The axioms
with which the programs begin are not general truths about the world
but are sentences whose meaning and truth is limited to the narrow
domain in which the program has to act. For this reason, the ``facts''
of one program usually cannot be used in a database for other programs.
4. The fourth level is still a goal. It involves representing
general facts about the world as logical sentences. Once put in
a database, the facts can be used by any program. The facts would
have the neutrality of purpose characteristic of much human information.
The supplier of information would not have to understand
the goals of the potential user or how his mind works. The present
ways of ``teaching'' computer programs amount to ``education
by brain surgery''.
A major difficulty is that fourth level systems require extensions
to mathematical logic. One kind of extension is formalized {\it nonmonotonic
reasoning}, first proposed in the late 1970s (McCarthy 1977, 1980, 1986),
(Reiter 1980), (McDermott and Doyle 1980), (Lifschitz 1988).
Mathematical logic is monotonic
in the following sense. If we have $A \vdash p$ and $A ⊂ B$, then we also
have $B \vdash p$.
If the inference is logical deduction, then exactly the same
proof that proves $p$ from $A$ will serve as a proof from $B$. If the
inference is model-theoretic, i.e. $p$ is true in all models of $A$,
then $p$ will be true in all models of $B$, because the models of $B$
will be a subset of the models of $A$. So we see that the monotonic
character of traditional logic doesn't depend on the details of the
logical system but is quite fundamental.
While much human reasoning is monotonic,
some important human common sense reasoning is not. We
reach conclusions from certain premisses that we would not reach if
certain other sentences were included in our premisses. For example,
learning that I own a car, you conclude that it is appropriate on a
certain occasion to ask me for a ride, but when you learn the further
fact that the car is in the garage being fixed you no longer draw that
conclusion. Some people think it is possible to try to save
monotonicity by saying that what was in your mind was not a general rule
about asking for a ride from car owners but a probabilistic rule. So
far these people have not worked out any detailed
epistemology for this approach, i.e. exactly what probabilistic
sentences should be used. Instead AI has moved to directly formalizing
nonmonotonic logical reasoning. Indeed it seems to me that
probabilistic reasoning has not been fully formalized, because
the step that corresponds to going from being told that I have
a car to concluding that it is appropriate to ask for a ride
isn't formalized in a way that allows for the subsequent addition
of the fact that the car is being fixed.
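As a rough sketch only, using the $ab$ and $aspect$ notation introduced
later in this paper, the example might be written
%
$$(∀ x\ y)(owns(x,y) ∧ car(y) ∧ ¬ab(aspect1(x,y)) ⊃ canask(x,y))$$
%
together with
%
$$(∀ x\ y)(beingfixed(y) ⊃ ab(aspect1(x,y))),$$
%
so that minimizing $ab$ licenses the conclusion from the first fact alone
and withdraws it when the second fact about the particular car is added.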
Nonmonotonic reasoning is an active field of study.
Progress is often driven by examples, e.g. the Yale shooting
problem (Hanks and McDermott 198xx), in which obvious
axiomatizations used with the available reasoning formalisms
don't seem to give the answers intuition suggests. One direction
being explored (Moore 198xx, Gelfond 198xx, Lifschitz 198xx)
involves putting facts about belief and knowledge explicitly in
the axioms --- even when the axioms concern nonmental domains.
Moore's classical (3 years old) example is ``If I had an elder
brother I'd know it.''
\section{Semantic Universality}
This section presents tentative ideas about identifying
a problem and suggestions for solving it.
If a computer program is to have human-level intelligence,
then it must be able to come up with new ideas.
We are not ready to attack that problem, but something can be
said about a necessary preliminary capability. This is the
ability to be told new ideas. To be flexible, this cannot
take the form of ``brain surgery''. That is it cannot require
a human to understand the current knowledge of
the machine and replace it by new knowledge. Instead the
new ideas must be impartable in some language by a person who
understands no more about the internal representation of
the knowledge than one human understands about how other
humans represent knowledge.
Let us suppose the internal language of the program
is a predicate calculus language, i.e. it has variables, predicates
and functions and quantifiers. Whether it is a first order
language doesn't seem important to the present discussion,
provided that if it is first order, it contains individuals
like sets that can be quantified over and can ``contain''
other individuals.
If one designs such a language with a specific domain
in mind, one is likely to do it in a way that may preclude the
subsequent communication of many important new ideas. Indeed
almost all of the languages for axiomatizing various domains
that have been used in mathematics or in AI preclude it.
The mathematician or logician interested in group theory
will cheerfully let the individuals of his formal language
be elements of the group, perhaps without even noticing
that this precludes an extension of the language to handling
several groups at once or handling rings and fields also.
This is harmless, because if he subsequently wants a language
including these entities he'll make a new one.
The situation is different when we want to extend
the utility of a language by additions rather than by
replacing the language by another one. We might begin by
having the program work in a metalanguage in which languages
are objects, but what about extensions to the metalanguage?
Mightn't we face an infinite regress?
Instead let's be bold and ask about the possibilities
of a {\it semantically universal language}. The goal is that
arbitrary new concepts could be introduced by adding sentences
in the language to an existing theory. Doubtless, this
idea can be formalized in such a way as to be refutable by
a diagonal argument. Don't waste your time by starting
with that! Another possibility is that the existence of
a semantically universal language is trivial to prove.
That seems somewhat more likely.
One part of the solution may be formalizing the notion
of context as suggested in (McCarthy 1987). Briefly, the idea
is to replace formulas like $plus(x,y,z)$ by formulas
like $holds(plus(x,y,z),c)$, where the symbol $c$ represents
a context. We get out of commitments to what a symbol like
$plus$ means by going to different context. In order not
to lose everything when the context is changed, there are
sentences relating contexts and nonmonotonic inheritance
rules relating what may be expected in a new context unless
information to the contrary is given.
To give a concrete example, suppose we have Boyle's
law relating the pressure and volume of a bit of gas, and we
want to extend it by introducing the temperature, i.e. adding
Charles's law so as to get the perfect gas law.
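A sketch of how this might go (the context names and the exact form of the
inheritance rule are only a guess, not a worked-out theory): Boyle's law is
first entered relative to a context $c1$ in which the temperature of the gas
is tacitly held fixed,
%
$$holds(pv = k, c1),$$
%
and a nonmonotonic rule such as
%
$$(∀ x)(holds(x,c1) ∧ ¬ab(aspect1(x,c2)) ⊃ holds(x,c2))$$
%
lets the sentences of $c1$ be inherited into an extended context $c2$ in
which temperature is explicit, with the sentences expressing Charles's law
cancelling the inheritance wherever the two conflict.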
\section{The Common Sense Informatic Situation}
It's complex. Here is a preliminary list of features and
considerations.
1. Entities of interest are known only partially, and the
information about entities and their relations that may be relevant
to achieving goals cannot be permanently separated from irrelevant
information.
%
(Contrast this with the situation in gravitational
astronomy in which it is stated in the informal introduction that
the chemical composition and shape of a body are irrelevant to the
theory; all that counts is the body's mass, and its initial position
and velocity).
Even within gravitational astronomy, non-equational theories
arise. For example, it was recently proposed that the periodic
extinctions are caused by showers of comets induced by a companion
star to the sun encountering and disrupting the Oort cloud of comets
every time it comes to perihelion. This theory is qualitative because
neither the orbit of the hypothetical star nor those of the comets
is available.
2. The formalism has to be {\it epistemologically adequate},
a notion introduced in (McCarthy and Hayes 1969). This means that
the formalism must be capable of representing the information that
is actually available, not merely capable of representing actual
complete states of affairs.
For example, it is insufficient to have a formalism that
can represent the positions and velocities of the particles in a
gas. We can't obtain that information, our largest computers don't
have the memory to store it even were it available, and our fastest computers
couldn't use the information to make predictions.
As a second example, suppose we need to be able to predict
someone's behavior. The simplest example is a clerk in a store.
The clerk is a complex individual about whom a customer may know
little. However, the clerk can usually be counted on to accept
money for articles brought to the counter, wrap them as appropriate
and not protest when the customer then takes the articles from the store.
The clerk can also be counted on to object if the customer attempts
to take the articles without paying the appropriate price. Describing
this requires a formalism capable of representing information about
human social institutions. Moreover, the formalism must be capable
of representing partial information about the institution, such as
a three year old's knowledge of store clerks. For example, a three
year old doesn't know the clerk is an employee or even what that
means. He doesn't require detailed information about the clerk's
psychology, and anyway this information is not ordinarily available.
\section{Some formalizations and their problems}
(McCarthy 1986) discusses several formalizations, proposing
those based on nonmonotonic reasoning as improvements of earlier
ones. Here are some.
1. Inheritance with exceptions. Birds normally fly, but there
are exceptions, e.g. ostriches and birds whose feet are encased in
concrete. The first exception might be listed in advance, but the
second has to be derived or verified when mentioned on the basis of
information about the mechanism of flying and the properties of
concrete.
There are many ways of nonmonotonically axiomatizing the
facts about which birds can fly. The following axioms using
a predicate $ab$ standing for ``abnormal'' seem
to me quite straightforward.
%\leql{a4a:}
$$(∀x)(¬ab(aspect1(x)) ⊃ ¬flies(x)).\leql{aiva}$$
%
Unless an object is abnormal in $aspect1$, it can't fly.
It wouldn't work to write $ab(x)$ instead of $ab(aspect1(x))$,
because we don't want a bird that is abnormal with respect to its ability
to fly to be automatically abnormal in other respects. Using aspects limits
the effects of proofs of abnormality.
%leql{a5:}
$$(∀x)(bird(x) ⊃ ab(aspect1(x))).\leql{av}$$
%leql{a6:}
$$(∀x)(bird(x) ∧ ¬ab(aspect2(x)) ⊃ flies(x)).\leql{avi}$$
%
Unless a bird is abnormal in $aspect2$, it can fly.
When these axioms are combined with other facts about the problem,
the predicate $ab$ is then to be circumscribed, i.e. given its minimal
extent compatible with the facts being taken into account. This has the
effect that a bird will be considered to fly unless other axioms imply
that it is abnormal in $aspect2$. \eqrev{av} is called a cancellation
of inheritance axiom, because it explicitly cancels the general presumption
that objects don't fly. This approach works fine when the inheritance
hierarchy is given explicitly. More elaborate approaches, some of which
are discussed in (McCarthy 1986) and (Haugh 198xx), are required when
hierarchies with indefinite numbers of sorts are considered.
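For example, the ostrich exception mentioned earlier might be expressed in
the same style by
%
$$(∀x)(ostrich(x) ⊃ ab(aspect2(x))),$$
%
which cancels the presumption that birds fly without preventing ostriches
from inheriting other properties of birds.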
2. (McCarthy 1986) contains a similar treatment of the effects
of moving and painting blocks using the situation calculus. Moving
and painting are axiomatized entirely separately, and there are no
axioms saying that moving a block doesn't affect the positions of other
blocks or the colors of blocks. A general ``common sense law of inertia''
%
$$(∀ p e s)(holds(p,s) ∧ ¬ab(aspect1(p,e,s)) ⊃ holds(p,result(e,s))),$$
%
asserts that a fact $p$ that holds in a situation $s$ is presumed
to hold in the situation $result(e,s)$ that results from an event
$e$ unless there is evidence to the contrary. Unfortunately, Lifschitz
(198xx) and McDermott (198xx) showed that simple treatments of
the common sense law of inertia admit nonstandard models.
Many authors have given more elaborate treatments, but in my opinion,
the results are not yet entirely satisfactory.
\section{What new formal tools does AI require?}
1. Generality and elaboration tolerance require that
formalisms for specific phenomena be imbedded
in general logical languages. For example, facts about
knowledge must be expressible in the same language as facts
about the effects of actions and other events, and such facts
must themselves be allowed as objects of knowledge.
\noindent Rich and poor entities
Consider my next trip to Japan. Considered as a plan it is
a discrete object with limited detail. I do not yet even plan to
take a specific flight or to fly on a specific day. Considered as
a future event, lots of questions may be asked about it. For example,
it may be asked whether the flight will depart on time and what precisely
I will eat on the airplane. We propose characterizing the actual trip
as a rich entity and the plan as a poor entity. Originally, I thought
that rich events referred to the past and poor ones to the future, but
this seems to be wrong. It's only that when one refers to the past
one is usually referring to a rich entity, while the future entities
one refers to are more often poor. However, there is no intrinsic
association of this kind.
(McCarthy and Hayes 1969) defines situations as rich entities.
However, the actual programs that have been written to reason in
situation calculus might as well regard them as taken from a
finite or countable set of discrete states.
Rich entities are open ended in that we can always introduce
more properties of them into our discussion. Poor entities can often
be enumerated, e.g. we can often enumerate all the events that we
consider reasonably likely in a situation. The passage from considering
rich entities in a given discussion to considering poor entities is
a step of nonmonotonic reasoning.
It seems to me that it is important to get a good formalization
of the relations between corresponding rich and poor entities.
This can be regarded as formalizing the relation between the world
and a formal model of some aspect of the world, e.g. between the
world and a scientific theory.
\section{Ability and Free Will}
AI has to put the problem of free will in the following
form. What view shall we build into a robot about its own abilities,
i.e. how shall we make it reason about what it can and cannot do?
(Wishing to avoid begging any questions, by {\it reason} we mean {\it compute}
using axioms, observation sentences, rules of inference and
nonmonotonic rules of conjecture.)
Let $A$ be a task we want the robot to perform, and let $B$
and $C$ be alternate intermediate goals either of which would
allow the accomplishment of $A$. We want the robot to be able
to choose between attempting $B$ and attempting $C$. It would be
silly to program it to reason: ``I'm a robot and a deterministic
device. Therefore, I have no choice between $B$ and $C$. What
I will do is determined by my construction.'' Instead it must
decide in some way which of $B$ and $C$ it can accomplish. It
should be able to conclude in some cases that it can accomplish
$B$ and not $C$, and therefore it should take $B$ as a subgoal
on the way to achieving $A$. In other cases it should conclude
that it {\it can} accomplish either $B$ or $C$ and should choose
whichever is evaluated as better according to the criteria we
provide it.
(McCarthy and Hayes 1969) contains proposals for the
formalism within which the robot should reason. The essential
idea is that what the robot can do is determined by the place
the robot occupies in the world --- not by its internal structure.
For example, if a certain sequence of outputs from the robot will
achieve $B$, then we conclude or it concludes that the robot
can achieve $B$ without reasoning about whether the robot will
actually produce that sequence of outputs.
Our contention is that this is approximately how any
system, whether human or robot, must reason about its ability to
achieve goals. The basic formalism will be the same, regardless
of whether the system is reasoning about its own abilities
or about those of other systems including people.
The above-mentioned paper also discusses the complexities
that come up when a strategy is required to achieve the goal and
when internal inhibitions or lack of knowledge have to be taken
into account.
\noindent Knowledge and Belief
Formalizing the facts about knowledge and belief has been
an active field ever since (Hintikka 1962). Here are some of its
features.
1. Most of the formalizations have taken the form of modal
propositional logics. These are useful for studying properties of
knowledge and belief in isolation from other phenomena, but it isn't
apparent how to imbed them in a general common sense framework. In such a
framework it should be possible to infer the plausibility of a plan to
hide something in order to prevent another person from finding out its
existence or location.
Even the simple puzzle of the three wise men with spots
on their foreheads requires going beyond the simple modal formalisms.
It requires the ability to assert and prove non-knowledge. It must
be assertable that the three wise men initially know only what the
king has told them. Then it must be inferable that after they have
answered whether they know the colors of their spots, they still don't
know the colors of their own spots. This requires knowledge as a
function of time or of situations that result from events.
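A sketch of the kind of assertion needed (the notation is illustrative
only, not a settled proposal): it must be provable that
%
$$¬knows(wise1,color(spot(wise1)),s0)$$
%
and also that
%
$$¬knows(wise1,color(spot(wise1)),result(replies,s0)),$$
%
where $result(replies,s0)$ is the situation after the wise men have
answered that they don't know.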
2. The modal formalizations also treat only ``knowing that'',
omitting ``knowing what'', ``knowing how'' and ``knowing about''.
%\section{Philosophical, Philosophical Logical and AI Aproaches to Knowledge
%and Belief}
\section{Three Approaches to Knowledge and Belief}
This section contrasts the approaches to knowledge and
belief characteristic of philosophy, philosophical logic and
artificial intelligence. Knowledge and belief have long been
studied in epistemology, philosophy of mind and in philosophical
logic. Since about 1960, knowledge and belief have also been
studied in AI.
It seems to me that philosophers have generally treated
knowledge and belief as {\it complete natural kinds}. According
to this view there are facts to be discovered about what
beliefs are. Moreover, once it is decided what the objects of
belief are (e.g. sentences or propositions), the definitions of
belief ought to determine for each such object $p$ whether the
person believes it or not. This last is the completeness mentioned
above. Of course, it is mainly human and sometimes animal beliefs that
have been considered. Philosophers have differed about whether
machines can ever be said to have beliefs, but even those who admit
the possibility of machine belief consider that what beliefs are
is to be determined by examining human belief.
The formalization of knowledge and belief has been studied
as part of philosophical logic, certainly since Hintikka's book (196xx),
but much of the earlier work in modal logic can be seen as applicable.
Different logics and axiom systems sometimes correspond to the
distinctions that less formal philosophers make, but sometimes the
mathematics dictates different distinctions.
AI takes a different course because of its different objectives,
but I'm inclined to recommend this course to philosophers also, partly
because we want their help but also because I think it has
philosophical advantages.
The first question AI asks is: Why study knowledge and belief
at all? Does a computer program solving problems and achieving goals
in the common sense world require beliefs, and must it use sentences
about beliefs? The answer to both questions is approximately yes. At
least there have to be data structures whose usage corresponds closely
to human usage in some cases. For example, a robot that could use
the American air transportation system has to know that travel agents
know airline schedules, that there is a book (and now a computer
accessible database) called the OAG that contains this information.
If it is to be able to plan a trip with intermediate stops it has
to have the general information that the departure gate from an
intermediate stop is not to be discovered when the trip is first
planned but will be available on arrival at the intermediate stop.
If the robot has to keep secrets, it has to know about how information
can be obtained by inference from other information, i.e. it has
to have some kind of information model of the people from whom
it is to keep the secrets.
However, none of this tells us that the notions of
knowledge and belief to be built into our computer programs must
correspond to the goals philosophers have been trying to
achieve. For example, the difficulties involved in building a
system that knows what travel agents know about airline schedules
are not substantially connected with questions about how the
travel agents can be absolutely certain. The system's notion of knowledge
doesn't have to be complete; i.e. it doesn't have to determine
in all cases whether a person is to be regarded as knowing a
given proposition. For many tasks it doesn't have to have
opinions about when true belief doesn't constitute knowledge.
The designers of AI systems can try to evade philosophical
puzzles rather than solve them.
Maybe some people would suppose that if the question
of certainty is avoided, the problems become easy. That has
not been our experience.
As soon as we try to formalize the simplest puzzles involving
knowledge, we encounter difficulties that philosophers have rarely
if ever attacked.
Consider the following puzzle of Mr. S and Mr. P.
{\it Two numbers $m$ and $n$ are chosen such that $2 ≤ m ≤ n ≤ 99$.
Mr. S is told their sum and Mr. P is told their product. The following
dialogue ensues:}
\halign{#\hfil\cr
\it Mr.~P: I don't know the numbers.\cr
Mr.~S: I knew you didn't know. I don't know either.\cr
Mr.~P: Now I know the numbers.\cr
Mr.~S: Now I know them too.\cr
In view of the above dialogue, what are the numbers?\cr}
Formalizing the puzzle is discussed in (McCarthy 1989).
For the present we mention only the following aspects.
1. We need to formalize {\it knowing what}, i.e. knowing what
the numbers are, and not just {\it knowing that}.
2. We need to be able to express and prove non-knowledge as well as
knowledge. Specifically we need to be able to express the fact that as
far as he knows, the numbers might be any pair of factors of the known
product.
3. We need to express the joint knowledge of Mr. S and Mr. P of
the conditions of the problem.
4. We need to express the change of knowledge with time, e.g.
how Mr. P's knowledge changes when he hears Mr. S say that he knew that
Mr. P didn't know the numbers and doesn't know them himself.
The first order language used to express the facts of this
problem involves an accessibility relation $A(w1,w2,p,t)$,
modeled on Kripke's semantics for modal logic. However, the
accessibility relation here is in the language itself rather than
in a metalanguage. Here $w1$ and $w2$ are possible worlds, $p$
is a person and $t$ is an integer time. The use of possible
worlds makes it convenient to express non-knowledge. Assertions
of non-knowledge are expressed as the existence of accessible
worlds satisfying appropriate conditions.
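For example (this formula is a hedged paraphrase, not taken verbatim from
the formalization cited below), the assertion that Mr. P does not know the
numbers at time $t$ in world $w1$ can be written as the existence of a
world accessible to him that agrees on the product but not on the pair:
%
$$(∃w2)(A(w1,w2,P,t) ∧ product(w2) = product(w1) ∧ pair(w2) ≠ pair(w1)).$$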
The problem was successfully expressed in the language
in the sense that an arithmetic condition determining the values
of the two numbers can be deduced from the statement. However, this
is not good enough for AI. Namely, we would like to include facts
about knowledge in a general purpose common sense database. Instead
of an {\it ad hoc} formalization of Mr. S and Mr. P, the problem
should be solvable from the same general facts about knowledge that
might be used to reason about the knowledge possessed by travel agents
supplemented only by the facts about the dialog. Moreover, the
language of the general purpose database should accommodate all
the modalities that might be wanted and not just knowledge. This
suggests using ordinary logic, e.g. first order logic, rather than
modal logic, so that the modalities can be ordinary functions or
predicates rather than modal operators.
Suppose we are successful in developing a ``knowledge formalism''
for our common sense database that enables the program controlling
a robot to solve puzzles and plan trips and do the other tasks that
arise in the common sense environment requiring reasoning about knowledge.
It will surely be asked whether it is really {\it knowledge} that
has been formalized. I doubt that the question has an answer.
This is perhaps the question of whether knowledge is a natural kind.
I suppose some philosophers would say that such problems are
not of philosophical interest. It would be unfortunate, however, if
philosophers were to abandon such a substantial part of epistemology
to computer science. This is because the analytic skills that
philosophers have acquired are relevant to the problems.
\noindent Meta-epistemology
% meta[s88,jmc] Message to AILIST on metaepistemology
% meta[e85,jmc] Meta-epistemology
% metaep[f82,jmc] A proposal for meta-epistemology
If we are to program a computer to think about its own
methods for gathering information about the world, then it needs
a language for expressing assertions about the relation between
the world, the information gathering methods available to an
information seeker and what it can learn. This leads to a subject
I like to call meta-epistemology. Besides its potential applications
to AI, I believe it has applications to philosophy considered in
the traditional sense.
Meta-epistemology is proposed as a mathematical theory
in analogy to metamathematics. Metamathematics considers the
mathematical properties of mathematical theories as objects.
In particular model theory as a branch of metamathematics deals
with the relation between theories in a language and interpretations
of the non-logical symbols of the language. These interpretations
are considered as mathematical objects, and we are only sometimes
interested in a preferred or true interpretation.
Meta-epistemology considers the relation between the world,
languages for making assertions about the world, notions of what
assertions are considered meaningful, what are accepted as rules
of evidence and what a knowledge seeker can discover about the
world. All these entities are considered as mathematical objects.
In particular the world is considered as a parameter.
Thus meta-epistemology has the following characteristics.
1. It is a purely mathematical theory. Therefore, its
controversies, assuming there are any, will be mathematical
controversies rather than controversies about what the real world
is like. Indeed metamathematics gave many philosophical issues
in the foundations of mathematics a technical content. For
example, the theorem that intuitionist arithmetic and Peano
arithmetic are equi-consistent removed at least one area of
controversy between those whose mathematical intuitions support
one view of arithmetic or the other.
2. While many modern philosophies of science assume some
relation between what is meaningful and what can be verified or
refuted, only special meta-epistemological systems will have the
corresponding mathematical property that all aspects of the world
relate to the experience of the knowledge seeker.
This has several important consequences for programming a
knowledge seeker.
1. A knowledge seeker must have no a priori prejudices (principles)
about what concepts might be meaningful. Whether and how a proposed concept
about the world
might ever connect with observation may remain in suspense for a very
long time while the concept is investigated and related to other concepts.
We illustrate this by a literary example. Moli\`ere's play
{\it Le Malade Imaginaire} includes a doctor who explains sleeping
powders by saying that they contain a ``dormitive virtue''. In the
play, the doctor is considered a pompous fool for offering a concept
that explains nothing. However, suppose the doctor had some intuition
that the dormitive virtue might be extracted and concentrated, say
by shaking the powder in a mixture of ether and water. Suppose he
thought that he would get the same concentrate from all substances
with soporific effect. He would certainly have a fragment of
scientific theory subject to later verification. Now suppose less ---
namely, he only believes that a common component is behind all
substances whose consumption makes one sleepy but has no idea
that he should try to invent a way of verifying the conjecture.
He still has something that, if communicated to someone more
scientifically minded, might be useful. In the play, the doctor
obviously sins intellectually by claiming a hypothesis as certain.
2. A knowledge seeker must be able to form new concepts
with only extremely tenuous relations with its previous linguistic
structure.
\section{Philosophical problems raised by AI}
In this paper we will ignore the controversies about
whether the term ``artificial intelligence'' is a violation of
``logical grammar'' (Shanker 1988) and whether intelligence is necessarily an
attribute of biological systems (Dreyfus 198xx, Searle 198xx).
I have expressed opinions
on these subjects in (McCarthy 19xx). I will only say that
I have found the arguments difficult to understand in one
important respect --- whether the authors are making an assertion
about what tasks human-built robots will eventually be able to perform
or only about the proper language to use in characterizing that
performance.
Practical experience and theoretical study of how to write
intelligent computer programs have led to the following propositions.
I don't claim to have proved them.
1. Epistemology. One studies what a program with given initial
knowledge and a given strategy for learning will find out about its
environment as a function of the environment, the strategy and what it
already knows. This is the AI approach to epistemology. Put in this
naive scientific way, the proposition seems obvious, but it is important
to note that it involves epistemological commitments. There is a world
whose properties are not determined by those of the program learning about
it. One studies the improvement of knowledge rather than starting with a
blank slate.
\noindent Ontology
We accept Quine's idea that one's ontology is
expressed by the range of the variables in the formalism. However,
it is not a priori obvious what the ontology should be. In particular,
nominalistic arguments aimed at restricting the ontology are often
faulty, because of a lack of imagination about what entities need to
be reasoned about. For example, we need propositions of some kind,
because we want to say about someone that he has some false beliefs
about a certain subject. As another example, we need to be able to
say that there are two things wrong with the boat: it lacks
oars and it has a leak. We need to enumerate the things wrong with the
boat in such a way that having a leak and having water rushing in count
as one thing wrong and not two.
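A hedged illustration of what such sentences might look like (the
predicate $wrong$ and the terms are invented for this example):
%
$$wrong(lack(oars),Boat1) ∧ wrong(leak1,Boat1)$$
%
and
%
$$(∀x)(wrong(x,Boat1) ⊃ x = lack(oars) ∨ x = leak1),$$
%
which says these are the only two things wrong and requires that the leak
and the water rushing in be identified as the same entity $leak1$.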
We need to distinguish between concepts specified in definitions
and natural kinds. The aspect of natural kinds important here is that
an agent regards them as objective. There is more to be learned about
what distinguishes objects of this kind from other objects. It is also
important that there isn't in common experience a continuous gradation
of fruit between lemons and oranges. This allows us, our children and
our programs to believe in the existence of a sharp distinction between
oranges and lemons while simultaneously being unable to state the
distinction.
Here are some of the kinds of entities that have been used
in AI.
1. Situations and states.
2. Events
$$s' = result(e,s)$$
%
is the situation that results when the event $e$ occurs in the situation $s$.
\section{Remarks}
1. Perhaps we can mollify John Searle and other philosophers
by referring to weak beliefs, pseudo-beliefs or imitation
beliefs. Maybe we should also agree to put into every computer
program the pseudo-belief: We computer programs have only
pseudo-beliefs, while our human masters have real beliefs.
But then we would have to figure out how a mere machine
could distinguish real beliefs from pseudo-beliefs.
\section{References}
{\bf McCarthy, John (1987)}:
``Generality in Artificial Intelligence'', {\it Communications of the ACM}.
Vol. 30, No. 12, pp. 1030-1035
% genera[w86,jmc]
{\bf Russell, Bertrand (1913)}: ``On the Notion of Cause'',
{\it Proceedings of the Aristotelian Society}, 13, pp. 1-26.
\section{Notes: not for final version}
1. List of what AI needs
contexts, nonmonotonic reasoning, elaboration tolerance, approximate theories
avoiding Montague's paradoxes of intentionality, rich entities,
frame and qualification problems, meta-epistemology
causality
2. slogans
modality si, modal logic no
3. Common sense database
Why common sense knowledge rather than scientific knowledge?
Russell quote.
``All philosophers, of every school, imagine that causation is one
of the fundamental axioms or postulates of science, yet, oddly enough,
in advanced sciences such as gravitational astronomy, the word `cause'
never occurs $\ldots$. The law of causality, I believe, like much that passes
muster among philosophers, is a relic of a bygone age, surviving, like the
monarchy, only because it is erroneously supposed to do no harm $\ldots$.''
-------
daedal.2[f87,jmc]
4. pedigrees
5. advice about reasoning
6. epistemological adequacy
7. frame problem,
Pylyshyn, Zenon W. (Ed.), {\it The Robot's Dilemma: The Frame Problem
in Artificial Intelligence} (Ablex, Norwood, NJ 1987)
Smoliar, Stephen W., Review of {\it The Robot's Dilemma}, {\it Artificial
Intelligence}, Vol. 36, No. 1, August 1988.
8. Avoid misunderstanding. There is no objection to the delimitation
of what is taken into account. That is necessary in common sense
reasoning also. The point is that the delimitation itself must be
a formal process, e.g. circumscription. Moreover, it must be reversible
if the limited theory proves inadequate.
Technical problems
Montague's paradoxes
Quantification under modal operators
G\"odel showed that provability could be formalized in principle, but he
didn't need to make it convenient, because his goals could be accomplished
by non-formalized meta-level reasoning about what formalized object level
reasoning would accomplish.
$(∃x)believes(Ralph,spy(x))$
This is ok provided we consider the statement to be made in a context in
which there is some way of specifying what Ralph would take as a
concept of x.
nonmon needed to form proposition to be given a probability
Dec 7
The AI point of view on philosophical problems.
What mental qualities shall we build?
Concentrate on the easy cases. There is plenty of trouble
there.
Common sense database is the key problem
\smallskip\centerline{Copyright \copyright\ \number\year\ by John McCarthy}
\smallskip\noindent{This draft of thomas[e88,jmc]\ TEXed on \jmcdate\ at \theTime}
\vfill\eject\end